Abstract

Background: Boar taint is an offensive urine or faecal-like odour, affecting the smell and taste of cooked pork from 
some mature non-castrated male pigs. Androstenone and skatole in fat are the molecules responsible. In most pig 
production systems, males, which are not required for breeding, are castrated shortly after birth to reduce the risk 
of boar taint. There is evidence for genetic variation in the predisposition to boar taint. 

A genome-wide association study (GWAS) was performed to identify loci with effects on boar taint. Five hundred 
Danish Landrace boars with high levels of skatole in fat (>0.3 μg/g) were each matched with a litter mate with low
levels of skatole and measured for androstenone. DNA from these 1,000 non-castrated boars was genotyped using
the Illumina PorcineSNP60 Beadchip. After quality control, tests for SNPs associated with boar taint were performed on
938 phenotyped individuals and 44,648 SNPs. Empirical significance thresholds were set by permutation (100,000). For 
androstenone, a 'regional heritability approach' combining information from multiple SNPs was used to estimate the 
genetic variation attributable to individual autosomes. 

Results: A highly significant association was found between variation in skatole levels and SNPs within the CYP2E1 
gene on chromosome 14 (SSC14), which encodes an enzyme involved in degradation of skatole. Nominal significance 
was found for effects on skatole associated with 4 other SNPs including a region of SSC6 reported previously. 
Genome-wide significance was found for an association between SNPs on SSC5 and androstenone levels and 
nominal significance for associations with SNPs on SSC13 and SSC17. The regional analyses confirmed large effects on 
SSC5 for androstenone and suggest that SSC5 explains 23% of the genetic variation in androstenone. The autosomal 
heritability analyses also suggest that there is a large effect associated with androstenone on SSC2, not detected using 
GWAS. 

Conclusions: Significant SNP associations were found for skatole on SSC14 and for androstenone on SSC5 in Landrace 
pigs. The study agrees with evidence that the CYP2E1 gene has effects on skatole breakdown in the liver. Autosomal 
heritability estimates can uncover clusters of smaller genetic effects that individually do not exceed the threshold for 
GWAS significance. 
Background

Boar taint is an offensive urine or faecal-like odour, affect- 
ing the smell and taste of some cooked pork. Androste- 
none and skatole, which are lipophilic compounds that 
accumulate in the fat of mature non-castrated male pigs, 
have been identified as the main causes of boar taint [1]. A
range of thresholds, above which negative reactions from
consumers are expected, has been reported for androstenone
(>0.5-1.0 μg/g fat) and skatole (>0.2-0.25 μg/g fat)
[2-6]. The scale of the problem was revealed in a large EU
study of carcasses from over 40,000 non-castrated male
pigs. Androstenone levels exceeded 1.0 μg/g fat and skatole
levels exceeded 0.25 μg/g fat in 30% and 11% of these
carcasses, respectively [3]. The cost of testing, losses in carcass
value and potential future lost sales result in a substantial 
economic cost to the industry. 
Androstenone or 5α-androst-16-en-3-one is a male steroid
produced in the testes at sexual maturity. High
concentrations of androstenone are present in the saliva of
male pigs where it is converted to a pheromone and is
an important olfactory trigger for sexual behaviour in
sows [7]. Androstenone accumulates in adipose tissue,
producing taint when the fat is heated. The ability to
detect this taint is itself under genetic control in humans and
largely governed by the OR7D4 receptor. Approximately
70% of the human population are unable to detect the
associated urine-like odour [8,9]. Skatole or 3-methyl-indole
is produced from the breakdown of tryptophan by bacteria
in the hindgut of the pig and subsequently absorbed into
the blood stream, where it is largely metabolised in the liver
and excreted in urine. Skatole which is not degraded in the
liver is deposited in peripheral tissues, mainly accumulating
in adipose tissue.

The most effective solution, to date, for controlling 
boar taint, is surgical castration shortly after birth. How- 
ever, as castration removes natural anabolic androgens 
that promote lean growth, non-castrates are leaner with 
10-30% greater efficiency in feed conversion and superior
meat quality. Furthermore, concerns over animal welfare
have led to legislative control [10]. Within Europe an
industry-wide agreement is in place to cease castration for
welfare reasons by 2018
(http://ec.europa.eu/food/animal/welfare/farm/initiatives_en.htm),
forcing the industry to explore other methods to prevent tainted carcasses.

Selective breeding based on the identification and ex- 
ploitation of genetic variation in androstenone and skatole 
levels could ultimately provide a more sustainable solution 
[11]. Recent studies have revealed Quantitative Trait Loci 
(QTL) with effects on skatole or androstenone, including 
QTL mapped to almost every chromosome [11-18]. The 
genetic architecture of predisposition to boar taint shows 
evidence for inter- and intra-breed variation with many 
of the reported effects appearing to be breed specific 
[11,16-19]. In general, Duroc pigs tend to have high 
levels of androstenone, and the Landrace breeds high 
levels of skatole. The relationship between the two compounds
is complex. Testicular steroids have been shown
to inhibit the breakdown of skatole in the liver but the
relationship between the compounds and the underlying
mechanisms are not well understood [20].

Although highly successful at identifying new trait asso- 
ciated loci and pathways, human genome-wide association 
studies (GWAS) have failed to capture a large proportion 
of the genetic variation in complex traits [21,22]. To ad- 
dress this so-called 'missing heritability' gap, methods have 
been developed involving the analysis of larger regions of 
the genome to account for variation unexplained by ana- 
lysis of individual single nucleotide polymorphisms (SNPs) 
[23]. Estimating local heritability using larger regions captures
additive variation in the genome which might elude
the stringent significance thresholds necessary for testing
each SNP individually. It has also been suggested
that rare variants not in complete linkage disequilibrium
(LD) with common SNP markers are captured by
estimating the genetic variation from an entire "region"
or set of SNPs [24].

The objective of this study was to identify genomic re- 
gions with effects on boar taint in Landrace pigs. 

Results are reported from the two approaches used: 
(i) single SNP analysis using genome-wide association, 
and (ii) a regional approach dividing SNPs by chromo- 
some and estimating genetic variation attributable to 
each autosome. 

Results 

We acquired data for a population of approximately 
6,000 commercial Danish Landrace boars. The animals 
were slaughtered at a mean age of 160 (±13) days. Mea- 
sures for skatole were taken using an in-line procedure 
at three Danish abattoirs. Power to detect a QTL can be 
increased in a finite sample by selecting those individuals 
that differ most from the phenotypic mean i.e. the ex- 
tremes of the phenotypic distribution. Here, we took ex- 
treme animals plus a within-litter 'control' in order to 
maximize power while controlling for family stratifica- 
tion. This strategy maximises the potential genetic in- 
formation to be gained from the sample [25,26]. Thus, 
500 boars with high skatole (>0.3 μg/g fat) at slaughter,
each matched with a low skatole litter mate (the lowest
in the litter and in any event below 0.3 μg/g) were
selected for genome-wide analysis. Phenotypic measurements
for androstenone in adipose tissue were subsequently 
collected for these selected 1,000 boars. 
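
The within-litter selection described above can be sketched as follows. This is a minimal illustration, not the authors' selection procedure: the record layout (`litter_id`, `animal_id`, `skatole`) and the function name `select_pairs` are hypothetical, and the sketch assumes that the animal with the highest skatole in a litter is taken as the "high" animal, which the text does not state.

```python
from collections import defaultdict

HIGH_THRESHOLD = 0.3  # skatole cut-off in ug/g fat, as given in the text


def select_pairs(records, threshold=HIGH_THRESHOLD):
    """Return (high_id, low_id) pairs, at most one pair per litter.

    records: iterable of (litter_id, animal_id, skatole) tuples
             (hypothetical layout, for illustration only).
    """
    litters = defaultdict(list)
    for litter_id, animal_id, skatole in records:
        litters[litter_id].append((skatole, animal_id))

    pairs = []
    for litter_id in sorted(litters):
        members = sorted(litters[litter_id])  # ascending skatole
        low_value, low_id = members[0]        # lowest litter mate
        high_value, high_id = members[-1]     # assumed choice of "high" animal
        # Keep the litter only if it holds a high animal above the threshold
        # and a distinct litter mate below the threshold.
        if high_value > threshold and low_value < threshold and high_id != low_id:
            pairs.append((high_id, low_id))
    return pairs
```

Litters without a boar above the threshold, or without a second boar below it, contribute no pair, so each selected high animal has a control that shares its family background.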

The measures for both skatole and androstenone were
positively skewed and were log transformed prior to analysis
(Additional file 1: Figure S1). Descriptive statistics
and heritabilities for both traits are given in Table 1.
Pedigree information and skatole measures were available
for 5,000 boars from the initial population that were
not selected for genotyping and genome-wide analyses.
Narrow sense heritabilities estimated from pedigree relationships,
h²pedigree (LM 1), using all 6,000 records for skatole
and 1,000 records for androstenone were moderate
at 0.39 (s.e. 0.03) and 0.52 (s.e. 0.09) respectively and
were similar to those previously reported [16,27]. The
genomic heritability estimate of 0.07 (s.e. 0.01) for skatole
in the selected individuals was very low (Table 1).
This result was expected and reflects the experimental
design, as the selected individuals comprised phenotypically
divergent sibs for skatole, thus maximising the within
family variance. Narrow sense heritability is based on a
ratio of the between and within family variance and is
therefore reduced (and was similarly reduced in the
pedigree based estimate using only the 1,000 genotyped
individuals (not shown)). Comparing variance components
estimated from the unselected and selected populations
provides an indication of how effects estimated
in the selected sample would scale to the population as a
whole. Mean skatole measures for selected boars and
their litter mates were 0.48 (sd. 0.25) and 0.15 (sd. 0.06)
μg/g respectively. Although data were selected for skatole,
androstenone measures also differed slightly (but
not significantly) between the two groups with a mean
of 1.25 (sd. 1.0) μg/g in the high skatole animals, and
0.85 (sd. 0.77) μg/g in their low skatole litter mates. The
estimated genetic correlation between skatole and androstenone
in the selected data was 0.27 (s.e. 0.20). Because
the estimate of the additive genetic variance in skatole is
biased downwards in the genotyped subset, the genetic
correlation between skatole and androstenone is also likely
to be underestimated.

Table 1 Descriptive statistics for skatole and androstenone

                                              Skatole              Androstenone
Mean (μg/g)                                   0.32                 1.05
Sd (μg/g)                                     0.24                 0.93
Range (μg/g)                                  0.02-2.49            0.06-9.23
Effect of slaughter weight (sig of effect)†   -0.0089 (3.43E-05)   -0.023 (0.0007)
Effect of meat percentage (sig of effect)†    -0.048 (9.1E-12)     -0.026 (0.06)
h²pedigree (se)                               0.39 (0.03)‡         0.52 (0.09)
h²SNP (se)                                    0.07 (0.01)          0.35 (0.08)

Data from 938 progeny of 128 sires and 441 dams.
†Covariate effects estimated in LMM using log trait.
h²pedigree refers to narrow sense heritability estimated in a linear mixed model
using GRM estimated from pedigree relationships.
‡Narrow sense heritability estimated for skatole using pedigree relationships
from 6000 individuals.
h²SNP refers to narrow sense heritability estimated in a linear mixed model
using GRM estimated from SNP genotypes.

Genome-wide association study (GWAS) 

DNA isolated from muscle samples collected at slaughter
was genotyped for 63,153 SNPs using the Illumina
PorcineSNP60 beadchip [28]. Analysis was restricted to the
autosomes. The genotype data were subjected to quality
control (QC) through an iterative process performed using
the GenABEL package in R 2.9.1 software [29,30]. The QC
criteria for SNPs were call rates > 0.95 and minor allele
frequencies (MAF) > 0.01. The QC criteria for individuals
were call rates > 0.95, heterozygosity < 0.45 (1% false discovery
rate (FDR)) and identity-by-state (IBS) < 0.95. After QC,
44,648 autosomal SNPs and 938 individuals were included in
the final analysis. SNP locations throughout the analysis are
given according to the published draft pig genome sequence
(Sscrofa10.2: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Sus_scrofa/Sscrofa10.2/)
[31] and as available in Ensembl release 75
(http://www.ensembl.org/Sus_scrofa/Info/Index).
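
The SNP-level QC criteria above (call rate > 0.95, MAF > 0.01) can be illustrated with a short sketch. This is not the GenABEL implementation: the genotype coding (0, 1 or 2 copies of one allele, `None` for a missing call) and all function names are assumptions made for this example.

```python
def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype call."""
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1.0 - freq)


def passes_snp_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Apply the two SNP thresholds quoted in the text."""
    return (call_rate(genotypes) > min_call_rate
            and minor_allele_frequency(genotypes) > min_maf)
```

The individual-level filters (call rate, heterozygosity, IBS) follow the same pattern, applied across SNPs within an animal rather than across animals within a SNP.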

Population stratification 

Genome wide association is based on differences in allele 
frequencies associated with differences in the trait under 
study. Phenomena such as admixture, selection and popu- 
lation stratification can result in spurious patterns of allele 
frequencies unrelated to the trait. Population stratification 
can be assessed by clustering individuals based on mea- 
sures of relatedness and examining clusters for evidence of 
systematic bias. Here, model-based clustering was performed
using the mclust function in R software 2.10.1 [30].
Mclust uses the Bayesian information criterion (BIC) and an
expectation maximisation (EM) algorithm to select the optimal
model and number of clusters. The best fit for the
data was 3 ellipsoidal clusters (Figure 1). Multi-dimensional
scaling (mds) was applied to a distance matrix obtained as
a function of the weighted genomic relationship matrix.
Multi-dimensional scaling returns a matrix with k columns
whose rows give the coordinates of the points chosen to
represent dissimilarities; k is a user-defined parameter
based on the expected number of clusters, here k = 3. The
3 columns from the mds matrix were fitted into the linear
model as covariates in order to account for the population
stratification indicated by the model-based clustering.

Figure 1 Visualization of population structure. Scree plot showing
best fit shown by bend in curve is 3 clusters for the data (top; x-axis:
number of clusters). Plot of three clusters using co-ordinates from
multi-dimensional scaling (bottom). Clusters are shown in green, red and
blue. Individuals are assigned to clusters or groups based on degree of
genetic relatedness.

The differences in study design between the two traits 
(i.e. skatole, androstenone) were reflected in the GWAS 
by very different estimates of lambda, which is an indica- 
tor of bias due to population structure. Lambda was 
close to 1 for all of the skatole analyses where high and 
low animals were matched sibs, but greater than 2 for 
the androstenone analyses. This result indicates that the 
sampling design for skatole was balanced and therefore, 
unaffected by potential biases arising from any popula- 
tion stratification. Although lambda indicates some bias 
for the androstenone analyses, this bias was largely 
accounted for with the inclusion of the co-ordinates 
from the multi-dimensional scaling (mds) matrix in the 
model (i.e. the inclusion of the mds matrix lowered the 
value of lambda from 2.0 to 1.3). Any remaining stratifi- 
cation was successfully corrected for by fitting the gen- 
omic relationship matrix. Full details are given in the 
materials and methods. 
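
For readers unfamiliar with lambda, the following sketch shows the usual genomic-control definition: the median of the observed 1 d.f. chi-square test statistics divided by the median expected under the null hypothesis. It is offered as an assumption about how lambda is defined here, not as a description of the GenABEL code; the function names are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Lambda: observed median test statistic over the null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def correct_statistics(chi2_stats):
    """Divide each statistic by lambda; lambda below 1 is left uncorrected."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]
```

Under this definition a lambda of 2.0 means the median test statistic is twice as large as expected by chance, which is why values near 1 for skatole indicate little systematic bias.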

Single SNP associations were performed using a 
GRAMMAR [29] analysis (LM 3) in GenABEL software. 
The results are summarized in Figure 2. Test statistics 
exceeding genome-wide significance were found on SSC14 
for skatole, and on SSC5 for androstenone. Further 
peaks on SSC13 and SSC17 exceed a genome-wide 5% 
FDR for effects on androstenone. Effects on skatole ex- 
ceeding nominal significance but not genome-wide sig- 
nificance were also seen on SSC3, SSC5, SSC6 and 
SSC8 (Table 2). 
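
The text does not say which procedure was used to obtain the genome-wide 5% FDR threshold. Purely as an illustration of how such a threshold can be derived, the sketch below applies the Benjamini-Hochberg step-up procedure; the choice of procedure and the function name are assumptions.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of p-values declared significant at the given FDR.

    Benjamini-Hochberg step-up rule: find the largest rank r such that
    p_(r) <= fdr * r / m, then reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])
```

An FDR threshold of this kind is less stringent than a permutation-based genome-wide threshold, which is consistent with SSC13 and SSC17 passing the former but not the latter.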

Skatole 

The effect of the SIRI0000194 SNP at the telomeric end of
SSC14 on skatole levels was highly significant (P < 1.4E-09),
exceeding the genome-wide threshold (Figure 2) and
explaining ~5% of the phenotypic variance. This SNP lies
within the CYP2E1 gene, which encodes an enzyme involved
in the breakdown of skatole [32-34]. The next ranking
SNP after the SNPs in LD with SIRI0000194 is the
ASGA0039716 SNP on chromosome 8. The ASGA0039716
SNP lies within the gene TET2 or methylcytosine dioxygenase 2.
There is no obvious connection between the function
of this gene or any other protein coding genes within
1 Mbp of TET2 as currently annotated in the pig genome
and skatole metabolism or storage. SNPs on chromosomes
3, 5 and 6 also reach nominal significance. When we fitted
SIRI0000194 as a fixed effect the ranking changed and
MARC0040638 was the top ranking SNP (P < 0.001).




Figure 2 Manhattan plots for genome-wide association analysis
for associations with skatole (top) and androstenone (bottom).
GRAMMAR method applied to eighteen autosomes plus unassigned
SNPs (far right in dark blue); x-axis: chromosome. Dashed line is the
5% FDR cut-off. Dotted line is the genome-wide significance
threshold set by 100,000 permutations. Results are based on corrected
P values using the lambda statistic to account for systematic bias.



Androstenone 

A peak of genome-wide significant SNP effects on androstenone
was seen on SSC5 (P < 6.8E-07) explaining 4% of the
phenotypic variation (Table 2). Two SNPs, H3GA0016037
and ASGA0025097, mapping 4 Mbp apart are highly
significant. Figure 3 shows the LD structure and genes
around the SSC5 peak SNP for androstenone. LD between
the two SNPs is relatively high at r² = 0.68, suggesting
that both SNPs are tagging the same causal
variant. There were also SNPs with large effects on chromosomes
8, 13 and 17 (Table 2). SSC13 and SSC17 exceeded
the genome-wide false discovery rate. ALGA0073594 on
SSC13 does not map to any known gene. ASGA0095898
on SSC17 lies within PTPRT or protein tyrosine phosphatase,
receptor type T and ASGA0093454 on SSC8 lies within
the FH2 domain containing 1 gene.

Table 2 Descriptive statistics for most significant SNP effects

Chr   SNP           Pos (bp)§     P value      SNP effect   Proportion phenotypic variance   Sig-full†
Skatole
14    SIRI0000194   153,477,507   1.40E-09**   -0.26        0.05                             1.66E-10
8     ASGA0039716   125,083,628   0.00029      0.04         0.001                            0.0018
5     ASGA0025182   28,884,161    0.00052      0.12         0.02                             0.00011
3     ALGA0020313   103,881,028   0.00082      0.17         0.01                             0.0006
6     MARC0040638   4,515,061     0.00144      -0.13        0.01                             0.00031
Androstenone
5     H3GA0016037   20,902,965    6.82E-07**   0.26         0.04                             5.17E-07
5     ASGA0025097   24,354,867    3.51E-06*    0.28         0.03                             2.03E-06
17    ASGA0095898   50,429,537    1.08E-05*    -0.52        0.02                             0.0001
13    ALGA0073594   203,892,414   2.38E-05*    -0.17        0.02                             3.63E-05
8     ASGA0093454   80,694,489    0.0002       -0.22        0.02                             0.00024

*exceeds 5% genome-wide false discovery rate. **exceeds genome-wide significance threshold estimated from 100,000 permutations.
†significance when tested in linear mixed model using ASReml software. §SNP position in base pairs in the Sscrofa10.2 genome assembly.

Autosomal heritability

The linear mixed model (2) can be extended to divide 
phenotypic variance into estimates of the genetic and 
environmental variance containing information from ge- 
notypes of a group of N SNPs spanning a region. This 
method has been implemented in the GCTA software 
package and it has been shown that the method can be 
used to estimate genetic variation for any region of the 
genome [35]. We divided the pig genome into the 18 autosomes
and jointly estimated the contribution to heritability
of androstenone (Figure 4, Additional file 2: Table S1) from
each autosome (6). The total heritability summed over all
autosomes was 0.29 for androstenone. As with the total
heritability, the autosomal heritabilities for skatole will be
specific to the genotyped subset and underestimated for
the unselected population due to the study design. For
this reason we have omitted the results on skatole
from the main text, but these results can be found in
Additional file 3.

Individual LRT (likelihood ratio tests) for each chromo- 
some for androstenone are detailed in Table 3. These were 
derived by the LRTpoly test comparing a linear mixed 
model fitting systematic or fixed effects and a GRM 
based on information from all SNPs with a model in- 
corporating an additional variance component for the 
genetic variance attributable to all SNPs on a chromo- 
some. This provides a test of whether inclusion of individ- 
ual autosomes provides a better model of the variance 
than the overall relationship matrix (as might be the case 
if the individual chromosomes harbor a gene or genes of 
large effect on the trait). Estimates of the autosomal heri- 
tabilities for effects on androstenone for LRTpoly are sum- 
marised in Table 3. 
Figure 3 LD decay from SNP H3GA0016037 plotted against significance of effect on androstenone, pairwise LD in the region and
genes located within the region. Sscrofa genome build 10.2.
Figure 4 Autosomal heritability or proportion of phenotypic
variance explained for androstenone, by chromosome (1-18).
*Estimate of heritability is larger than standard error. All 18
autosomes were fitted simultaneously in a mixed linear model.



For androstenone, the only autosome with a significant
LRTpoly test for genetic variance was chromosome 5,
explaining 6% of the phenotypic variation, reflecting the
GWAS results. Under the LRTpoly method, autosomes
2, 3 and 13 each explain 2% of the phenotypic variation;
however, the estimates are not significant. When all autosomes
are fitted simultaneously (Figure 4) SSC2, SSC3
and SSC13 explain 5%, 3% and 4% of the genetic variation.
The sum of autosomal estimates of genetic variation from
LRTpoly is 0.12 (Table 3). The genetic variation explained
by fitting all autosomes simultaneously was 0.29 (Additional
file 2: Table S1), indicating that LRTpoly is conservative, as
might be expected because part of the individual autosomal
heritabilities is absorbed by the overall genomic polygenic
effect.

An alternative testing strategy is to fit all autosomes in 
a full model and then drop them one at a time for a
reduced model (LRTdrop). A comparison of significance of
autosomal heritability of androstenone using three test- 
ing strategies is given in Figure 5. Dropping a chromo- 
some from the model including all the autosomes 
provides a test for whether genetic variance is associated 
with that particular chromosome whilst accounting for 
background polygenic effects on other chromosomes. This 
contrasts with the model containing only a single
chromosome (LRTind in Figure 5) where the LRT and variance
explained may be inflated by genetic variance from the 
rest of the genome that is not explicitly included in the 
model. For androstenone the results for LRTdrop suggest 
that chromosomes 2, 3, 5 and 13 explain a significant pro- 
portion of the variance. 



Table 3 Estimates of autosomal heritability for androstenone

Chr   h²autosome   se     p-val     h²polygenic   se
1     0            0.04   1         0.38          0.06
2     0.02         0.02   0.16      0.33          0.06
3     0.02         0.02   0.16      0.34          0.06
4     0            0.02   1         0.36          0.06
5     0.06         0.03   0.00051   0.29          0.06
6     0            0.03   1         0.37          0.06
7     0            0.02   1         0.35          0.06
8     0            0.02   1         0.35          0.06
9     0            0.02   1         0.37          0.06
10    0            0.02   1         0.35          0.06
11    0            0.02   1         0.35          0.06
12    0            0.02   0.38      0.35          0.06
13    0.02         0.02   0.21      0.33          0.06
14    0            0.02   1         0.36          0.06
15    0            0.02   0.36      0.35          0.06
16    0            0.02   1         0.35          0.06
17    0            0.02   1         0.35          0.06
18    0            0.01   1         0.36          0.06

Testing strategy was to compare fitting a random polygenic effect (based on a
GRM estimated using all genotyped SNPs across the genome) plus a random
effect for variance attributed to SNPs from a single autosome with a reduced
model fitting only the random polygenic effect. P-val is the corresponding p
value based on the distribution of the LRT being a mixture of a χ² distribution
with 1 d.f. and a point mass at zero. h²autosome is an estimate of the heritability
of the autosome; h²polygenic is an estimate of the heritability from the entire genome.
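
The p-values in Table 3 come from a likelihood ratio test of a single variance component. A minimal sketch of that calculation follows, assuming the common 50:50 mixture of a point mass at zero and a chi-square distribution with 1 d.f.; the mixing proportion is an assumption, since the table note does not state it, and the function names are hypothetical.

```python
import math


def chi2_1df_sf(x):
    """Upper-tail probability of a chi-square distribution with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_statistic(loglik_full, loglik_reduced):
    """Twice the log-likelihood difference, truncated at zero."""
    return max(2.0 * (loglik_full - loglik_reduced), 0.0)


def mixture_p_value(lrt):
    """P-value under a 50:50 mixture of a point mass at 0 and chi-square(1)."""
    if lrt <= 0.0:
        return 1.0  # variance estimate on the boundary: no evidence
    return 0.5 * chi2_1df_sf(lrt)
```

Under this convention an autosome whose variance estimate is zero gives LRT = 0 and p = 1, which matches the many rows of Table 3 with h²autosome = 0 and p-val = 1.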



Discussion 

A genome-wide association study (GWAS) was carried
out to identify SNPs associated with effects on androstenone
and skatole in intact male pigs. The effect of the
SIRI0000194 SNP on skatole estimated by fitting the
genotypes as a covariate in the linear mixed model (3)
was 5% of the phenotypic variance of the selected
population (Table 2). The expectation in the general
population assuming a heritability of 0.4 is that it would
explain ~12.5% of the genetic variation. The SIRI0000194
SNP, which was reported previously as AJ697882_2412
[32], is located within the promoter of the CYP2E1 gene. In
a small separate sample of 83 Danish pigs significantly
more AJ697882_2412 (SIRI0000194) CC homozygotes were
observed in the 'high' skatole group [32]. More recently
associations between skatole levels in two Duroc populations
and the AJ697882_2412 (SIRI0000194) SNP have been
reported [33]. Again the CC homozygotes exhibited the
highest skatole levels. Although SIRI0000194 lies within a block
of high LD (Figure 6) spanning several other genes, there is
evidence to support CYP2E1 as a candidate for the gene
responsible for the observed associations with skatole
levels. This gene has been previously identified as a
candidate and is involved in the degradation of skatole
in the liver where it is solely and abundantly expressed
[36] (see also http://biogps.org).

Figure 5 Likelihood ratio test (LRT) for significance of autosomal
heritability or proportion of phenotypic variance explained for
androstenone using three different linear mixed models, by autosome.
LRTind is comparing a model fitting an individual autosome with a null model,
LRTdrop is where all autosomes are fitted and compared with a model
which drops each autosome in turn, LRTpoly is comparing a model
fitting an individual autosome plus a polygenic effect with a model
containing only a polygenic effect.
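
The figure of ~12.5% quoted for the SIRI0000194 effect follows from dividing the proportion of phenotypic variance explained by the assumed heritability. The sketch below is one reading of that arithmetic, with a hypothetical function name; it is not taken from the authors' analysis code.

```python
def share_of_genetic_variance(prop_phenotypic, heritability):
    """Proportion of genetic variance explained by a locus.

    prop_phenotypic: proportion of phenotypic variance explained by the locus.
    heritability:    narrow sense heritability of the trait (0 < h2 <= 1).
    """
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    return prop_phenotypic / heritability


# 5% of phenotypic variance with h2 = 0.4 gives about 12.5% of genetic variance.
skatole_share = share_of_genetic_variance(0.05, 0.4)
```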

The GWAS for skatole was repeated, fitting the SNP
SIRI0000194 into the linear mixed model as a fixed effect.
This model resulted in a change of ranking among the
SNPs. The effect of greatest significance (P < 0.001) was
associated with SNP marker MARC0040638 located on
chromosome 6 within the estradiol 17-beta-dehydrogenase
2 (HSD17B2) gene. The HSD17B2 gene and MARC0040638
SNP were located at SSC6:4,514,200-4,578,665 in an earlier
genome assembly (Sscrofa9) but are located on unplaced
scaffolds on the present assembly (Sscrofa10.2). The
assignment of MARC0040638 SNP to SSC6 is confirmed
from radiation hybrid mapping data (Additional file 2:
Table S1 in [37]). Both the MARC0040638 SNP and
HSD17B2 gene are present in the sequence of the CH242-77H3
BAC clone (Genbank accession: GU929847). Incomplete
sequence data from this BAC clone contribute to the
current pig genome assembly (Sscrofa10.2) on SSC6 6.876-6.939 Mbp.
This SNP did not exceed the FDR or genome-wide
threshold; however, a region on chromosome 6
spanning this gene was previously found to be significant
for skatole in Landrace pigs [18]. Ramos et al. [18] found
significant associations between skatole levels in Duroc
pigs and SNPs mapping to a 6 Mbp region on SSC6 corresponding
to 1.829-8.498 Mbp in Sscrofa10.2 coordinates
and thus including the MARC0040638 SNP and HSD17B2
gene. In an earlier study, we mapped QTL for skatole,
as detected by a (human) sensory panel, by linkage analysis
with a low density microsatellite marker map with
the closest marker SW1353 mapping to SSC6: 9.872 Mbp
(Sscrofa10.2 coordinates) [13]. Human estradiol 17-beta-dehydrogenase
2 (HSD17B2) is involved in the synthesis of
the 17 beta-hydroxysteroids: delta 5-androstene-3 beta, 17
beta-diol, testosterone, 17 beta-estradiol and dihydrotestosterone
[38]. The HSD17B2 gene is thus important for
steroid hormone synthesis and is abundantly expressed
in pig liver, ureter and stomach (fundus) [36] (see also
http://biogps.org). Another 17-beta hydroxysteroid dehydrogenase
gene (HSD17B7) has been examined as a candidate
gene for an androstenone QTL on SSC4 [39].

A significant effect on androstenone was found associated
with the H3GA0016037 SNP on chromosome 5
explaining ~4% of the phenotypic variance. H3GA0016037
lies between the gene encoding transcription factor NEUROD4
(neurogenic differentiation 4) and the TESPA1 (thymocyte
expressed positive selection association 1) locus. In
humans TESPA1 is involved in the selection of thymocytes
and T-cell development. It has been hypothesised that the
production of glucocorticoid steroids may in some way
regulate thymocyte selection [40]. The second most significant
GWAS result was for ASGA0025097 which is
located ~4 Mbp distal to the H3GA0016037 SNP on
chromosome 5. The genes of interest located within
this 4 Mbp region include the retinol dehydrogenase 5
(RDH5) and retinol dehydrogenase 16 (RDH16) genes. The
RDH gene encodes an enzyme which recognizes
5α-androstan-3α,17β-diol and androsterone as substrates and
is expressed in liver, testes and other tissues in humans
[41]. RDH16 is abundantly expressed in pig liver, testes
and placenta [36] (see also http://biogps.org). Another
17-beta hydroxysteroid dehydrogenase gene (HSD17B6) is
located about 0.5 Mbp upstream of the ASGA0025097 SNP.
The 4 Mbp region between the two top SNPs is gene-rich
and exhibits high levels of LD in the Danish Landrace
population studied (Figure 3). Ironically, many of the genes
in this region encode olfactory receptors. The minor allele
frequency for both SNPs (ASGA0025097, H3GA0016037)
was 0.14 and the r² between them was 0.68. Fitting either
SNP results in the loss of the effect of the other, indicating that both
SNPs are tagging the same causal variant. This region has
been found to be significant for androstenone measured in
the fat of Duroc pigs, and for estradiol in Landrace pigs
[16]; however, this region has not previously been found to
be significant for effects on androstenone levels in Landrace.
Results from the regional heritability study reflected 
the GWAS analysis with the greatest heritability for 
androstenone on chromosome 5. This indicates that the 
regional approach successfully identifies autosomes with 
genetic variation attributable to the trait and that genetic 
variance is not correlated to the length of autosomes as 
seen by Yang et al. [24]. Here, the correlation of variance 
explained, with length of autosome, was 0.02 (P < 0.93) 
for androstenone. There was evidence of information be- 
yond the GWAS results from the regional approach. 
The method did point to an association of SSC2 and 
SSC3 with androstenone not seen in the GWAS. Highly 
significant effects for multiple QTL on these chromo- 
somes associated with androstenone have been previ- 
ously reported [11-13,16]. We cannot ascertain whether 
the SNP effect on SSC17 associated with androstenone 
is undetected by the regional approach or a spurious 
artifact of the GWAS. One approach might be using se- 
quence information for imputation to increase the num- 
ber of SNP genotypes and subsequently to divide the 
genome into many smaller regions providing greater 
resolution. Combined results of multiple SNP genotypes 
are less likely to yield spurious results from anomalies 
such as population stratification and differing minor allele 
frequencies at individual SNPs. The autosomes explaining 
the most variation have a greater likelihood of housing 
putative candidate genes and pathways. A further use for 
the estimated SNP or region effects in this population 
could be genomic prediction in unphenotyped individuals. 
This potential application is of particular relevance in 
traits that can only be measured post slaughter such as 
boar taint where phenotypes are of high economic impact 
and could result in rejection of the entire carcass. 

Conclusions 

Significant associations were found for skatole on SSC14 
and for androstenone on SSC5 in Landrace pigs. The 
study agrees with a body of evidence that the CYP2E1 
gene has effects on skatole breakdown in the liver. Auto- 
somal heritability estimates agree with the GWAS and 
provide an opportunity to identify regions for further 
study. Differences between the GWAS and the auto- 
somal heritability suggest that for androstenone there is 
variation explained by SSC2 and SSC3 that is not de- 
tected by the GWAS and that the SNP on chromosome 
17 does not appear to contribute variance at the level of 
the autosome. 

Methods 

Animals 

All the animals involved in this study were raised under 
conventional pig production conditions and were not 
subjected to any experimental procedures. All the sam- 
ples for the study were collected post-mortem in a com- 
mercial abattoir. 

Taint measures 

Tissue fat samples were assayed for skatole levels using a 
colorimetric method in-house at the abattoir [42]. A second 
tissue sample taken about an hour after slaughter was subse- 
quently assayed for androstenone by the Norwegian School 
of Veterinary Science using a modified time-resolved fluor- 
oimmunoassay [43]. 

Heritabilities 

A fixed effect of herd and the significant covariates meat 
percentage, slaughter weight and age at slaughter were 
estimated using a linear mixed model in the software package 
ASReml2 [44] (1). Fixed effects and covariates for skatole 
were estimated using the entire population of 6,000 animals 
in order to achieve the greatest possible accuracy. 
Heritabilities were estimated using pedigree relationships 
in the entire population of 6,000 individuals for skatole 
and the 1,000 individuals phenotyped for androstenone. 

$Y = X\beta + Zu + e$ (1) 

Where $Y$ is an $n \times 1$ vector of log phenotypes, $n$ is the 
number of individuals, $X$ is an incidence matrix relating 
solutions for fixed effects of herd and covariates of age and 
mds co-ordinates, contained in $\beta$, to individuals, $u$ is an 
$n \times 1$ vector of genetic effects, $Z$ is an $n \times n$ incidence 
matrix relating individuals to genetic effects, and $e$ is an 
$n \times 1$ vector of individual residual effects. $u$ is distributed 
as $u \sim N(0, A\sigma^2_u)$ and $e$ is distributed as 
$e \sim N(0, I\sigma^2_e)$. $A$ is the $n \times n$ genetic relationship 
matrix estimated from pedigree relationships. 

Genomic relationship matrices 

SNP genotypes were used to estimate shared coancestry 
or identity by state between individuals with rare SNPs 
weighted more heavily. The n x n genomic relationship 
matrix (GRM) of relatedness at a population level between 
n individuals gives the covariance structure for the pheno- 
type based on the premise that the more related two indi- 
viduals are, or the greater the amount of the genome they 
share in common, the greater the expectation of phenotypic 
similarity. The proportion of alleles two individuals 
share in common is summed across all markers weighted 
by allele frequencies in the population in order to obtain 
an accurate estimate of how related two individuals are 
either across the entire genome or at a given region. Gen- 
omic relationship matrices were estimated using GenABEL 
[29] and GCTA [35] software. 
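
The allele-frequency weighting described above can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 allele counts and uses the commonly published standardised form, in which each SNP is centred by 2p and scaled by 2p(1-p); the function name and the handling of monomorphic SNPs are illustrative assumptions, and the exact estimators inside GenABEL and GCTA may differ in detail.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """Sketch of a frequency-weighted GRM (illustrative, not the GenABEL/GCTA code).

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    Returns an n x n matrix averaged over the SNPs that are polymorphic.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0            # allele frequency of the counted allele
    keep = (p > 0.0) & (p < 1.0)        # monomorphic SNPs carry no information
    x, p = x[:, keep], p[keep]
    # Centre by the expected count 2p and scale by the binomial SD,
    # so that rare alleles receive more weight.
    z = (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]
```

Restricting the columns of `genotypes` to the SNPs of one autosome gives the regional matrix used for the autosomal analyses.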

Using the marker information for the 1,000 individ- 
uals, heritabilities were estimated by fitting the SNP 
based genomic relationship matrix from GenABEL in a 
linear mixed model to estimate polygenic effects from 
marker information (2). A genotypic correlation was es- 
timated by a bivariate analysis of the two traits fitting 
the genomic relationship matrix using ASReml 2 soft- 
ware [44]. 

$Y = X\beta + Wg + e$ (2) 

Where $g$ is an $N \times 1$ vector of SNP effects, $N$ is the 
number of SNPs, $W$ is an $n \times N$ incidence matrix relating 
SNP genotypes to $g$. $G$ is the $n \times n$ genomic relationship 
matrix estimated from SNP genotypes and $g \sim N(0, G\sigma^2_g)$. 

Association analysis 

Single SNP association tests were performed using a 
GRAMMAR [29] analysis (3) in GenABEL software. GRAMMAR 
uses a score test to identify associations between SNP 
genotypes and trait residuals after fixed and background 
genetic or polygenic effects are accounted for in the linear 
mixed model (2). Polygenic effects were estimated using a 
GRM estimated from the average relationship between 
individuals at all SNP markers (weighted by allele frequency) 
across the genome. 

$y = \mathrm{SNP} + e$ (3) 

Where $y$ is a vector of trait residuals from (2), SNP is 
a vector of SNP genotypes and $e$ is a vector of random 
residuals. 

$\lambda = \mathrm{Median}(T_1^2, \ldots, T_N^2) / 0.456$ (4) 

Where $T$ is the test statistic for $N$ SNPs from (3). 

A correction factor or lambda [29,45] was estimated 
from the distribution of test statistics to further account 
for systematic bias (4). A factor greater than 1 is indicative 
of systematic inflation of test results when compared to the 
distribution expected under the null hypothesis. A factor 
less than 1 often results from over-correction in a GRAMMAR 
analysis. The GRAMMAR function in GenABEL adjusts for this 
deflation factor. Permutation analysis (100,000) was used to 
determine a rigorous threshold for genome-wide significance 
accounting for multiple testing and for any unaccounted-for 
systematic bias. A less rigorous FDR cut-off of < 0.05 was 
applied to report SNPs of interest to aid the comparison of 
results from past and future study populations. 
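
The lambda calculation in (4) can be sketched in a few lines. This is an illustration only: it assumes the test statistics are 1 df chi-square values, the function names are hypothetical, and the rescaling step simply divides every statistic by lambda, which may not match every detail of the GenABEL implementation.

```python
import statistics

def genomic_inflation_factor(chi2_stats, expected_median=0.456):
    """Lambda as in (4): observed median test statistic over the null median.

    0.456 is the approximate median of a chi-square distribution with 1 df.
    """
    return statistics.median(chi2_stats) / expected_median

def rescale_statistics(chi2_stats, lam):
    """Divide each statistic by lambda (illustrative correction step)."""
    return [t / lam for t in chi2_stats]
```

A lambda above 1 shrinks the statistics and a lambda below 1 enlarges them, which is the sense in which a deflation factor is adjusted for.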

As GRAMMAR analyses tend to underestimate true SNP 
effects [29], genome-wide significant SNPs identified 
with the GRAMMAR analysis were fitted individually as 
covariates in the linear mixed model using ASReml 2 
software to estimate SNP effects and to verify significance 
(5). The additive genetic variance was estimated as 
$2p(1-p)a^2$ where $p$ is the allele frequency for the most 
common SNP allele and $a$ is the estimated effect. A further 
check was that this estimate was consistent with 
the difference in phenotypic variance when fitting, and 
not fitting, SNP genotype as a covariate in the LMM. 

$Y = \mu + \mathrm{herd} + b_1 \cdot \mathrm{SNP} + b_2 \cdot \text{slaughter weight} + b_3 \cdot \mathrm{age} + b_4 \cdot \text{meat percentage} + \mathrm{mds} + a + e$ (5) 

Where $Y$ = log trait. Herd is fitted as a fixed effect. SNP 
genotype, slaughter weight, age, meat percentage and 
co-ordinates from the multi-dimensional scaling (mds) are 
fitted as covariates, $a$ is a random polygenic effect estimated 
using a SNP-based relationship matrix and $e$ is the random 
residual. 
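
The additive variance attributed to a single SNP follows directly from its allele frequency and estimated effect. The sketch below is illustrative; the function name is hypothetical and the effect is assumed to be the allele substitution effect on the log-trait scale.

```python
def snp_additive_variance(p, a):
    """Additive genetic variance 2p(1-p)a^2 for one SNP.

    p: frequency of the most common allele (between 0 and 1).
    a: estimated effect of the SNP from the mixed model.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * p * (1.0 - p) * a ** 2
```

Because the expression is symmetric in p and 1-p, using the minor or the major allele frequency gives the same variance.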

Estimation of regional genetic contribution or 'autosomal 
heritability' 

The linear mixed model (2) can be extended to divide 
phenotypic variance into estimates of the genetic and 
environmental variance containing information from ge- 
notypes of a group of N SNPs spanning a region. This 
method has been implemented in the GCTA software 
package and it has been shown that the method can be 
used to estimate genetic variation for any region of the 
genome [35]. We divided the pig genome into the 18 au- 
tosomes and estimated the contribution to heritability 
from each autosome (6). For these analyses only SNPs 
that mapped to Sscrofa 10.2 were used; any SNPs without 
a position on the current assembly were omitted as 
they could not be assigned to an autosome. Omitting 
these SNPs (~13% of all SNPs) from the GRM made 
very little difference to the estimate of total genetic 
variance. The heritability estimate dropped by 0.0065. This 
indicates that this subset of annotated SNPs was 
sufficiently large to accurately estimate relationships 
between individuals and to capture the genetic variance. 

$Y = X\beta + \sum_{chr=1}^{18} W u_{chr} + e$ (6) 

To avoid confounding of genetic variation of the trait 
and potential variation due to population stratification, 
eigenvectors were estimated from the genetic relation- 
ship matrix and the first 4 principal components fitted 
as covariates in the linear mixed model. This is slightly 
conservative and based on the results of the model based 
clustering described earlier which showed that the data 
forms 3 distinct clusters. The fixed effects and covariates 
of herd, age, meat percentage and slaughter weight were 
fitted into a linear mixed model together with eighteen 
variance components - one for each of the eighteen auto- 
somes requiring 18 separate genetic relationship matrices 
to model the covariance structure and to partition the 
genetic variance into estimates of autosomal heritability. 
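
Extracting leading principal components from a relationship matrix amounts to an eigendecomposition. The following is a minimal sketch under the assumption that the matrix is symmetric; the function name is hypothetical and this is not the routine used by GCTA itself.

```python
import numpy as np

def top_principal_components(grm, k=4):
    """Return the k largest eigenvalues and their eigenvectors.

    grm: symmetric n x n relationship matrix.
    The eigenvector columns can be fitted as covariates in the mixed model.
    """
    vals, vecs = np.linalg.eigh(np.asarray(grm, dtype=float))
    order = np.argsort(vals)[::-1][:k]   # eigh returns ascending eigenvalues
    return vals[order], vecs[:, order]
```

The sign of each eigenvector is arbitrary, which does not matter when the components are used only as covariates.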

To test the significance of individual autosomes, a 
likelihood ratio test (LRT) was used in which a model fitting 
the individual autosome plus a variance component for all 
SNPs in the GRM (i.e. the equivalent of a genomic polygenic 
effect) was compared to a model fitting only the 
polygenic effect (LRTpoly). All SNPs were used in the 
polygenic effect to ensure that the models were truly 
nested. This conservative approach ensures that the 
variance explained by an autosome is not inflated by 
background polygenic effects. 

Two further approaches were used. Firstly, a model 
fitting a variance component estimated from the SNPs 
on a single autosome was compared with a null model 
(LRTind). Secondly, a model fitting all 18 variance 
components was compared with a model dropping each of the 
autosomes in turn (LRTdrop). 

GCTA solves the linear mixed model (LMM) and ob- 
tains estimates of genetic and residual variances by re- 
stricted maximum likelihood (REML) using the average 
information (AI) algorithm. 

A test statistic was obtained using a standard LRT 
statistic calculated as twice the difference between the log 
likelihoods of the full model and the null or reduced 
model that did not fit a genetic component. The LRT 
was tested against a chi-square distribution. The LRT for 
one extra variance component is distributed as a mixture 
of a point mass at 0 and a chi-square distribution with 
1 degree of freedom (df) [46]. To account for this, the 
P-value for a test assuming 1 df was divided by two. 
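
The test described above can be sketched as follows. The function name is hypothetical and the log likelihoods are assumed to come from nested REML fits; the 1 df chi-square tail probability is computed from the complementary error function so that only the standard library is needed.

```python
import math

def lrt_variance_component(loglik_full, loglik_reduced):
    """LRT for one extra variance component.

    Returns the test statistic and the halved 1 df P-value,
    following the mixture argument given in the text.
    """
    # A negative difference can arise from numerical noise; truncate at 0.
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # Upper tail of a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p_1df = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_1df / 2.0
```

For example, a statistic of about 3.84, which gives P = 0.05 on 1 df, yields P = 0.025 after halving.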